
    Central Limit Theorem with Exchangeable Summands and Mixtures of Stable Laws as Limits

    The problem of convergence in law of normed sums of exchangeable random variables is examined. First, the problem is studied with respect to arrays of exchangeable random variables, and the special role played by mixtures of products of stable laws - as limits in law of normed sums in different rows of the array - is emphasized. Necessary and sufficient conditions for convergence to a specific form in the above class of measures are then given. Moreover, sufficient conditions for convergence of sums in a single row are proved. Finally, a potentially useful variant of the formulation of the results just summarized is briefly sketched, a more complete study of it being deferred to future work.
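
    As a concrete toy illustration of the kind of limit that appears here (our own example, not taken from the paper; the distributional choices below are assumptions made for the sketch): summands that are conditionally i.i.d. Gaussian given a single random scale are exchangeable, and their normed sums are distributed as a scale mixture of centred normals, i.e. a mixture of stable laws with exponent 2.

        import numpy as np
        from scipy import stats

        # Exchangeable summands via a random scale: given S, the X_i are i.i.d. N(0, S^2).
        # Unconditionally the X_i are exchangeable, and S_n / sqrt(n) has exactly the law of
        # S * Z with Z ~ N(0, 1): a scale mixture of centred normals (stable exponent 2).
        rng = np.random.default_rng(0)
        n_rows, n = 5_000, 1_000

        scales = rng.gamma(shape=2.0, scale=1.0, size=n_rows)       # one random scale per row
        x = scales[:, None] * rng.standard_normal((n_rows, n))      # exchangeable within each row
        normed_sums = x.sum(axis=1) / np.sqrt(n)

        mixture = rng.gamma(2.0, 1.0, size=n_rows) * rng.standard_normal(n_rows)
        print("two-sample KS statistic:", round(stats.ks_2samp(normed_sums, mixture).statistic, 4))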

    Non-asymptotic approximations of Gaussian neural networks via second-order Poincaré inequalities

    There is a growing interest in large-width asymptotic properties of Gaussian neural networks (NNs), namely NNs whose weights are initialized according to Gaussian distributions. A well-established result is that, as the width goes to infinity, a Gaussian NN converges in distribution to a Gaussian stochastic process, which provides an asymptotic or qualitative Gaussian approximation of the NN. In this paper, we introduce some non-asymptotic or quantitative Gaussian approximations of Gaussian NNs, quantifying the approximation error with respect to some popular distances for (probability) distributions, e.g. the 1-Wasserstein distance, the total variation distance and the Kolmogorov-Smirnov distance. Our results rely on the use of second-order Gaussian Poincaré inequalities, which provide tight estimates of the approximation error, with optimal rates. This is a novel application of second-order Gaussian Poincaré inequalities, which are well known in the probabilistic literature as a powerful tool to obtain Gaussian approximations of general functionals of Gaussian stochastic processes. A generalization of our results to deep Gaussian NNs is discussed. Comment: 20 pages, 3 figures.
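
    For intuition about the Gaussian approximation that these bounds quantify, here is a minimal simulation sketch (our own illustration; the width, the tanh activation and the 1/sqrt(width) rescaling are assumptions of the example, not the paper's exact setting): the output of a one-hidden-layer Gaussian NN at a fixed input, over many independent initialisations, is compared with a matching Gaussian through the Kolmogorov-Smirnov statistic.

        import numpy as np
        from scipy import stats

        # Output of a one-hidden-layer NN with Gaussian weights at a fixed input,
        # collected over many independent initialisations.
        rng = np.random.default_rng(1)
        d, width, n_init = 5, 512, 5_000
        x = rng.standard_normal(d)                              # fixed input

        outputs = np.empty(n_init)
        for i in range(n_init):
            W1 = rng.standard_normal((width, d)) / np.sqrt(d)   # first-layer weights
            b1 = rng.standard_normal(width)
            W2 = rng.standard_normal(width) / np.sqrt(width)    # 1/sqrt(width) rescaling
            outputs[i] = W2 @ np.tanh(W1 @ x + b1)

        # Large-width heuristic: output approximately N(0, E[tanh(W x + b)^2]).
        z = rng.standard_normal((100_000, d)) @ x / np.sqrt(d) + rng.standard_normal(100_000)
        sigma2 = np.mean(np.tanh(z) ** 2)
        ks = stats.kstest(outputs, 'norm', args=(0.0, np.sqrt(sigma2)))
        print(f"KS statistic against the matching Gaussian: {ks.statistic:.4f}")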

    Infinitely wide limits for deep Stable neural networks: sub-linear, linear and super-linear activation functions

    There is a growing literature on the study of large-width properties of deep Gaussian neural networks (NNs), i.e. deep NNs with Gaussian-distributed parameters or weights, and Gaussian stochastic processes. Motivated by some empirical and theoretical studies showing the potential of replacing Gaussian distributions with Stable distributions, namely distributions with heavy tails, in this paper we investigate large-width properties of deep Stable NNs, i.e. deep NNs with Stable-distributed parameters. For sub-linear activation functions, a recent work has characterized the infinitely wide limit of a suitably rescaled deep Stable NN in terms of a Stable stochastic process, both under the assumption of a "joint growth" and under the assumption of a "sequential growth" of the width over the NN's layers. Here, assuming a "sequential growth" of the width, we extend such a characterization to a general class of activation functions, which includes sub-linear, asymptotically linear and super-linear functions. As a novelty with respect to previous works, our results rely on the use of a generalized central limit theorem for heavy-tailed distributions, which allows for an interesting unified treatment of infinitely wide limits for deep Stable NNs. Our study shows that the scaling of Stable NNs and the stability of their infinitely wide limits may depend on the choice of the activation function, bringing out a critical difference with respect to the Gaussian setting. Comment: 20 pages, 2 figures.
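
    The probabilistic ingredient named above, the generalised central limit theorem, can be illustrated in its classical i.i.d. form (a sketch of the standard theorem, not of the paper's NN construction; the value of alpha and the sample sizes are arbitrary choices): sums of symmetric alpha-stable variables rescaled by n^(-1/alpha), rather than n^(-1/2), reproduce the same stable law, which is the scaling used for Stable NNs.

        import numpy as np
        from scipy import stats

        # Generalised CLT in its simplest form: sums of i.i.d. symmetric alpha-stable
        # variables, rescaled by n**(-1/alpha), are again symmetric alpha-stable.
        rng = np.random.default_rng(2)
        alpha, n, n_samples = 1.5, 100, 2_000

        x = stats.levy_stable.rvs(alpha, 0.0, size=(n_samples, n), random_state=rng)
        rescaled_sums = x.sum(axis=1) / n ** (1.0 / alpha)
        reference = stats.levy_stable.rvs(alpha, 0.0, size=n_samples, random_state=rng)

        ks = stats.ks_2samp(rescaled_sums, reference)
        print(f"two-sample KS statistic: {ks.statistic:.3f}  (small: same stable law)")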

    Stable behaviour of infinitely wide deep neural networks

    We consider fully connected feed-forward deep neural networks (NNs) where weights and biases are independent and identically distributed according to symmetric centered stable distributions. Then, we show that the infinitely wide limit of the NN, under suitable scaling on the weights, is a stochastic process whose finite-dimensional distributions are multivariate stable distributions. The limiting process is referred to as the stable process, and it generalizes the class of Gaussian processes recently obtained as infinitely wide limits of NNs (Matthews et al., 2018b). Parameters of the stable process can be computed via an explicit recursion over the layers of the network. Our result contributes to the theory of fully connected feed-forward deep NNs, and it paves the way to expand recent lines of research that rely on Gaussian infinitely wide limits. Comment: 25 pages, 3 figures.
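
    As a companion to the statement above, here is a small simulation sketch of the setting (our own illustration: depth, width, the tanh activation and the stability index alpha are arbitrary choices, and the recursion for the limit parameters is not reproduced): a feed-forward NN with i.i.d. symmetric alpha-stable weights and biases, each layer rescaled by fan_in^(-1/alpha), evaluated at a fixed input over many initialisations so that the heavy-tailed behaviour of the output can be inspected.

        import numpy as np
        from scipy import stats

        # Deep NN with i.i.d. symmetric alpha-stable weights and biases; each layer is
        # rescaled by fan_in**(-1/alpha) (for alpha = 2 this reduces to 1/sqrt(fan_in)).
        rng = np.random.default_rng(3)
        alpha, depth, width, d = 1.8, 2, 64, 4
        x = rng.standard_normal(d)                       # fixed input

        def stable(size):
            return stats.levy_stable.rvs(alpha, 0.0, size=size, random_state=rng)

        def network_output(x):
            h, fan_in = x, d
            for _ in range(depth):
                W, b = stable((width, fan_in)), stable(width)
                h = np.tanh(W @ h / fan_in ** (1.0 / alpha) + b)
                fan_in = width
            return stable(width) @ h / width ** (1.0 / alpha) + stable(1)[0]

        outputs = np.array([network_output(x) for _ in range(500)])
        # For alpha < 2 the output is heavy-tailed: occasional very large values appear
        # across initialisations and the empirical variance is unstable.
        print("median |output|:", np.median(np.abs(outputs)), " max |output|:", np.abs(outputs).max())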

    Deep Stable neural networks: large-width asymptotics and convergence rates

    In modern deep learning, there is a recent and growing literature on the interplay between large-width asymptotic properties of deep Gaussian neural networks (NNs), i.e. deep NNs with Gaussian-distributed weights, and Gaussian stochastic processes (SPs). Motivated by empirical analyses that show the potential of replacing Gaussian distributions with Stable distributions for the NN’s weights, in this paper we present a rigorous analysis of the large-width asymptotic behaviour of (fully connected) feed-forward deep Stable NNs, i.e. deep NNs with Stable-distributed weights. We show that as the width goes to infinity jointly over the NN’s layers, i.e. the “joint growth” setting, a rescaled deep Stable NN converges weakly to a Stable SP whose distribution is characterized recursively through the NN’s layers. Because of the non-triangular structure of the NN, this is a non-standard asymptotic problem, to which we propose an inductive approach of independent interest. Then, we establish sup-norm convergence rates of the rescaled deep Stable NN to the Stable SP, under the “joint growth” and a “sequential growth” of the width over the NN’s layers. Such a result quantifies the difference between the “joint growth” and the “sequential growth” settings, showing that the former leads to a slower rate than the latter, depending on the depth of the layer and the number of inputs of the NN. Our work extends some recent results on infinitely wide limits for deep Gaussian NNs to the more general deep Stable NNs, providing the first result on convergence rates in the “joint growth” setting.

    Large-width functional asymptotics for deep Gaussian neural networks

    In this paper, we consider fully connected feed-forward deep neural networks where weights and biases are independent and identically distributed according to Gaussian distributions. Extending previous results (Matthews et al., 2018a;b; Yang, 2019), we adopt a function-space perspective, i.e. we look at neural networks as infinite-dimensional random elements on the input space $\mathbb{R}^I$. Under suitable assumptions on the activation function we show that: i) a network defines a continuous Gaussian process on the input space $\mathbb{R}^I$; ii) a network with re-scaled weights converges weakly to a continuous Gaussian process in the large-width limit; iii) the limiting Gaussian process has almost surely locally $\gamma$-Hölder continuous paths, for $0 < \gamma < 1$. Our results contribute to recent theoretical studies on the interplay between infinitely wide deep neural networks and Gaussian processes by establishing weak convergence in function-space with respect to a stronger metric.
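
    To make the function-space perspective tangible, here is a minimal sketch (an illustration under assumed choices of width, activation and input grid, not the paper's construction): a one-hidden-layer Gaussian NN is treated as a random function on a grid of inputs, and the covariance kernel of that random function, which in the large-width limit is the covariance of the limiting Gaussian process, is estimated by Monte Carlo over initialisations.

        import numpy as np

        # Function-space view: the NN evaluated on a whole grid of inputs is one sample
        # path of a random function; across initialisations we estimate its covariance
        # kernel K(x, x') on the grid.
        rng = np.random.default_rng(4)
        width, n_init = 1_024, 2_000
        grid = np.linspace(-2.0, 2.0, 9)                 # inputs in R (I = 1)

        paths = np.empty((n_init, grid.size))
        for i in range(n_init):
            w1 = rng.standard_normal(width)              # input-to-hidden weights
            b1 = rng.standard_normal(width)
            w2 = rng.standard_normal(width) / np.sqrt(width)
            paths[i] = np.tanh(np.outer(w1, grid) + b1[:, None]).T @ w2

        K = np.cov(paths, rowvar=False)                  # empirical kernel on the grid
        print(np.round(K, 3))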

    Predictive Constructions Based on Measure-Valued Pólya Urn Processes

    Measure-valued Pólya urn processes (MVPP) are Markov chains with an additive structure that serve as an extension of the generalized k-color Pólya urn model towards a continuum of possible colors. We prove that, for any MVPP $(\mu_n)_{n \ge 0}$ on a Polish space $\mathbb{X}$, the normalized sequence $(\mu_n / \mu_n(\mathbb{X}))_{n \ge 0}$ agrees with the marginal predictive distributions of some random process $(X_n)_{n \ge 1}$. Moreover, $\mu_n = \mu_{n-1} + R_{X_n}$, $n \ge 1$, where $x \mapsto R_x$ is a random transition kernel on $\mathbb{X}$; thus, if $\mu_{n-1}$ represents the contents of an urn, then $X_n$ denotes the color of the ball drawn with distribution $\mu_{n-1} / \mu_{n-1}(\mathbb{X})$ and $R_{X_n}$ the subsequent reinforcement. In the case $R_{X_n} = W_n \delta_{X_n}$, for some non-negative random weights $W_1, W_2, \ldots$, the process $(X_n)_{n \ge 1}$ is better understood as a randomly reinforced extension of Blackwell and MacQueen's Pólya sequence. We study the asymptotic properties of the predictive distributions and the empirical frequencies of $(X_n)_{n \ge 1}$ under different assumptions on the weights. We also investigate a generalization of the above models via a randomization of the law of the reinforcement.
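
    The special case $R_{X_n} = W_n \delta_{X_n}$ can be simulated in a few lines. The sketch below is our own illustration, with a Gaussian base measure of total mass theta and Exponential(1) weights chosen purely as examples: each colour is drawn from the normalised urn contents and the urn is then reinforced at that colour.

        import numpy as np

        # Randomly reinforced Polya sequence with mu_0 = theta * N(0, 1): X_n is drawn
        # from mu_{n-1} / mu_{n-1}(X) and the urn is reinforced by W_n * delta_{X_n}.
        rng = np.random.default_rng(5)
        theta, n_steps = 5.0, 2_000

        atoms, weights = [], []        # support points and weights of the discrete part of mu_n
        total_mass = theta
        draws = np.empty(n_steps)

        for n in range(n_steps):
            if rng.uniform() < theta / total_mass:       # fresh colour from the base measure
                colour = rng.standard_normal()
            else:                                        # old colour, chosen proportionally to its weight
                w_arr = np.array(weights)
                colour = atoms[rng.choice(len(atoms), p=w_arr / w_arr.sum())]
            w_n = rng.exponential(1.0)                   # random reinforcement W_n
            atoms.append(colour)
            weights.append(w_n)
            total_mass += w_n
            draws[n] = colour

        print("distinct colours among", n_steps, "draws:", np.unique(draws).size)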

    Prediction-based uncertainty quantification for exchangeable sequences

    Prediction has a central role in the foundations of Bayesian statistics and is now the main focus in many areas of machine learning, in contrast to the more classical focus on inference. We argue that, in the basic setting of random sampling - that is, in the Bayesian approach, exchangeability - the uncertainty expressed by the posterior distribution and credible intervals can indeed be understood in terms of prediction. The posterior law on the unknown distribution is centered on the predictive distribution, and we prove that it is marginally asymptotically Gaussian, with variance depending on the predictive updates, i.e. on how the predictive rule incorporates information as new observations become available. This allows one to obtain asymptotic credible intervals based only on the predictive rule (without having to specify the model and the prior law), sheds light on frequentist coverage as related to the predictive learning rule, and, we believe, opens a new perspective towards a notion of predictive efficiency that seems to call for further research.
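
    As a toy illustration of the message in the simplest conjugate case (our own example; the paper works in far greater generality and directly from the predictive rule): for exchangeable Bernoulli observations with the Beta-Bernoulli predictive rule, the posterior is asymptotically Gaussian and centred at the predictive probability, so an approximate credible interval can be written from the predictive rule alone and compared with the exact posterior interval.

        import numpy as np
        from scipy import stats

        # Conjugate toy case: Beta(a, b)-Bernoulli predictive rule
        # p_n = P(X_{n+1} = 1 | X_1, ..., X_n) = (a + sum x_i) / (a + b + n).
        # The posterior Beta(a + s, b + n - s) is approximately N(p_n, p_n (1 - p_n) / n)
        # for large n, so the interval is driven by the predictive probability and n alone.
        rng = np.random.default_rng(6)
        a = b = 1.0
        theta_true, n = 0.3, 2_000
        x = rng.binomial(1, theta_true, size=n)

        s = x.sum()
        p_n = (a + s) / (a + b + n)                          # predictive probability of a success
        half = 1.96 * np.sqrt(p_n * (1.0 - p_n) / n)         # asymptotic Gaussian half-width

        exact = stats.beta.interval(0.95, a + s, b + n - s)  # exact posterior credible interval
        print(f"prediction-based interval: ({p_n - half:.4f}, {p_n + half:.4f})")
        print(f"exact Beta posterior 95%:  ({exact[0]:.4f}, {exact[1]:.4f})")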